MatchZoo: A Toolkit for Deep Text Matching
نویسندگان
چکیده
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Specically, the toolkit provides a unied data preparation module for dierent text matching problems, a exible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interactionfocused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo.
منابع مشابه
Deep Fusion LSTMs for Text Semantic Matching
Recently, there is rising interest in modelling the interactions of text pair with deep neural networks. In this paper, we propose a model of deep fusion LSTMs (DF-LSTMs) to model the strong interaction of text pair in a recursive matching way. Specifically, DF-LSTMs consist of two interdependent LSTMs, each of which models a sequence under the influence of another. We also use external memory ...
متن کاملSemantic Annotation, Analysis and Comparison: A Multilingual and Cross-lingual Text Analytics Toolkit
Within the context of globalization, multilinguality and cross-linguality for information access have emerged as issues of major interest. In order to achieve the goal that users from all countries have access to the same information, there is an impending need for systems that can help in overcoming language barriers by facilitating multilingual and cross-lingual access to data. In this paper,...
متن کاملThe Joy of Parallelism with CzEng 1.0
CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the amount of non-matching sentence pairs. CzEng 1.0 is automat...
متن کاملMultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity
We present MultiGranCNN, a general deep learning architecture for matching text chunks. MultiGranCNN supports multigranular comparability of representations: shorter sequences in one chunk can be directly compared to longer sequences in the other chunk. MultiGranCNN also contains a flexible and modularized match feature component that is easily adaptable to different types of chunk matching. We...
متن کاملFamilia: An Open-Source Toolkit for Industrial Topic Modeling
Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the utilities of topic modeling in industry as two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, inclu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1707.07270 شماره
صفحات -
تاریخ انتشار 2017